Variable Selection in Large Environmental Data Sets Using Principal Components Analysis
Abstract
In many large environmental datasets redundant variables can be discarded without the loss of extra variation. Principal components analysis can be used to select those variables that contain the most information. Using an environmental dataset consisting of 36 meteorological variables spanning 37 years, four methods of variable selection are examined along with different criteria levels for deciding on the number of variables to retain. Procrustes analysis, a measure of similarity and bivariate plots are used to assess the success of the alternative variable selection methods and criteria levels in extracting representative variables. The broken-stick model is a consistent approach to choosing significant principal components and is chosen here as the more suitable criterion in combination with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component. Copyright © 1999 John Wiley & Sons, Ltd.
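As a rough illustration of the combination the abstract favours, the Python sketch below applies a broken-stick criterion to a single principal components analysis and then picks one variable per retained component, starting from the first. It is not the paper's implementation. The function names, the use of the correlation matrix, the rule of stopping at the first component that fails the broken-stick threshold, and the choice of the highest absolute loading as the selection rule are all assumptions made for this example.

```python
import numpy as np


def broken_stick(p):
    """Expected share of variance for each of p components under the
    broken-stick model: b_k = (1/p) * sum_{i=k}^{p} 1/i."""
    return np.array(
        [np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)]
    )


def select_variables(X):
    """Return (k, indices): the number of components retained and one
    column index of X chosen per retained component.

    Assumption: PCA is run on the correlation matrix, and components are
    retained in order until one falls below its broken-stick share.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    p = len(eigvals)
    share = eigvals / eigvals.sum()
    exceeds = share > broken_stick(p)
    # Index of the first failing component equals the number retained.
    k = p if exceeds.all() else int(np.argmin(exceeds))

    available = np.ones(p, dtype=bool)
    chosen = []
    for j in range(k):
        # Highest absolute loading among variables not yet chosen.
        loadings = np.where(available, np.abs(eigvecs[:, j]), -1.0)
        best = int(np.argmax(loadings))
        chosen.append(best)
        available[best] = False
    return k, chosen
```

Calling `select_variables(X)` on an array with observations in rows and variables in columns returns the number of retained components together with the selected column indices. Running it on the correlation rather than the covariance matrix keeps variables measured in different units comparable, which is why that assumption is made here.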
Similar resources
Persian Handwriting Analysis Using Functional Principal Components
Principal components analysis is a well-known statistical method for dealing with large dependent data sets. It is also used with functional data, both for data reduction and for representing variation. Handwriting, on the other hand, is one of the objects studied in various statistical fields such as pattern recognition and shape analysis. Considering time as the argument,...
Analysis of physiochemical and microbial quality of waters of the Karkheh River in southwestern Iran using multivariate statistical methods
Rapid population growth as well as agricultural and industrial development have increased the contamination of Iranian rivers. This study utilized principal components analysis (PCA) to determine the degree of significance of qualitative parameters of water resources in the Karkheh River in southwestern Iran. Cluster analysis (CA) grouped the monitoring stations based on the water quality data ...
Variable Selection and Principal Component Analysis
In most applied disciplines, many variables are often measured on each individual, which results in a huge data set consisting of a large number of variables, say p [Sharma (1996)]. Using such a data set in a statistical analysis may cause several problems. The dimensionality of the data set can often be reduced, without disturbing the main features of the whole data set by Principal...
Feature selection using genetic algorithm for classification of schizophrenia using fMRI data
In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...
Two-stage Variable Clustering for Large Data Sets
In data mining, principal component analysis is a popular dimension reduction technique. It also provides a good remedy for the multicollinearity problem, but its interpretation of input space is not as good. To overcome the interpretation problem, principal components (cluster components) are obtained through variable clustering, which was implemented with PROC VARCLUS. The procedure uses obli...